--- title: COVID-19 in Oregon author: Robert W. Walker date: '2020-04-05' ---

Oregon COVID data

I wanted to create a self-updating visualization of the data on COVID-19 in the state of Oregon provided by OHA. There is a separate page of daily maps. Oregon reports a set of daily snapshots while progression requires ingesting new data each day so I began tracking it March 20; the process of scraping it is detailed in a separate file. This updates, expands, and consolidates two previous posts here and here. A few cautionary notes. First, these are cumulative case counts that are almost certainly incomplete. Second, there is reason to believe that the death numbers may also be incomplete because it is dangerous to police and first responders are spread thinly.

These data are current as of April 5, 2020; this first bit of R code loads the current data from my github.

library(tigris)
library(rgdal)
library(htmltools)
library(viridis) 
library(sf)
library(ggrepel)
library(ggthemes)
library(gganimate)
library(patchwork)
library(hrbrthemes)
load(url(paste0("https://github.com/robertwwalker/rww-science/raw/master/content/R/COVID/data/OregonCOVID",Sys.Date(),".RData")))

Verifying, these data are current as of 2020-04-05 according to the loaded dataset.

A running commentary on the data

The central takeaway is that the underlying data reports are constantly changing over time. They are reporting new and different quantities and this destroys the comparability of data. Some of it can be reconstructed but much of it is only useful going forward and keeping the schema constant. I hope that will happen at some point.

2020-03-23 and before It started with only three tables. Then four, then five, adding footnotes and rearranging the data structures became common. Until the data structures solidify, maintaining comparability, and feeding appropriate data to the visualization tool continue to pose challenges and require a fair amount of tweaking.

2020-03-30 Footnotes for some entries appear newly today. That’s irritating because it changes the variable names.

2020-03-31 The status of COVID patients and available hospital details were previously in one table but now were split into two parts. The underlying auto-scraper required changes.

2020-04-02 The status of COVID patients and available hospital details are pending in the report. I have updated the rest. This remains true as of April 5, 2020.

A base map

To build a map to work from, I need a map library. Load the tigris library then grab the map as an sf object; there is a geom_sf that makes them easy to work with. Finally, I join the map to the data and build a tool tip to display.

counties.t <- counties(state = "41", resolution = "500k", class="sf")
Map.Me <- left_join(counties.t,Oregon.COVID, by=c("NAME" = "County"))
Res.1 <- Map.Me %>% 
  filter(date==Sys.Date()) %>% 
  mutate(Percent.Positive = Number.of.cases / Negative.test.results, 
         TTip = paste0("Percent Positive: ", Percent.Positive, "/n Positive Tests: ", Number.of.cases, "/n Negative Tests: ", Negative.test.results, "/n Deaths: ", Deaths))

Summarising the County Data: An Animation

I will use a ggplot to build Oregon’s map and fill it with case numbers. geom_sf() plots the geometry. I have to repel the labels to prevent print overlap, and scheme the colors. The last step is to animate it by time. If I stop the label repelling, they will not bounce but it adds excitement. There is a separate page of daily maps.

Anim1 <- Map.Me %>% 
  ggplot(., aes(geometry=geometry, fill=Number.of.cases, label=NAME, group=NAME)) + 
  geom_sf() +
  geom_label_repel(stat = "sf_coordinates",
    min.segment.length = 0,
    colour = "white",
    segment.colour = "white",
    size = 3,
    box.padding = unit(0.05, "lines"))  + scale_fill_continuous_tableau("Red") + theme_minimal() + labs(title="COVID-19 in Oregon", subtitle="{frame_time}", x="", y="", caption="Made with R, ggplot2, and ggrepel by @PieRatio \n data: https://govstatus.egov.com/OR-OHA-COVID-19") + transition_time(date)
ResA <- animate(Anim1, end_pause=100, nframes=200)
ResA

Percent Positive by State

library(plotly)
Res1 <- Res.1 %>% 
  mutate(Percent.Positive = Number.of.cases / Negative.test.results) %>%
  filter(Negative.test.results > 10) %>% 
  ggplot(., aes(geometry=geometry, fill=Percent.Positive, label=NAME, text=TTip)) + 
   geom_sf() + 
   scale_fill_viridis_c(option = "A") + 
   theme_map() + 
   labs(title="Positive COVID-19 Tests in Oregon", subtitle=paste0(Sys.Date(),": \nonly counties with 10+ tests shown."), x="", y="", caption="Made with R, ggplot2, and ggrepel by @PieRatio \n data: https://govstatus.egov.com/OR-OHA-COVID-19", fill="Percent Positive")
ggplotly(Res1)

Testing in Oregon

What is the cumulative status of testing? These data, updated daily, render a count of all tests performed since January of 2020.

Oregon.Tests <- Oregon.Tests %>% filter(Category!="Pending")
OR.Testing <- Oregon.Tests %>% group_by(date) %>% summarise(Total = sum(Outcome, na.rm=TRUE))
Test1 <- Oregon.Tests %>% 
  ggplot(.) + 
  aes(x=date, y=Outcome, fill=Category, label=Outcome) + 
  geom_col() + 
  geom_label(col="magenta", show.legend = FALSE, vjust=1, size=2.5) + 
  scale_fill_viridis_d() + 
  labs(title="Cumulative [since January 2020] COVID-19 Testing in Oregon", y="Completed Tests", x="Date", subtitle="Total appears in black and white", caption = "data: https://govstatus.egov.com/OR-OHA-COVID-19") + 
  geom_label(data=OR.Testing, aes(x=date, y=Total+100, label=Total), fill="white", color="black", inherit.aes = FALSE, vjust = 0, size=2.5)
Test1

A Second Look at Testing

We should worry about the degree to which tests are applied at all randomly; they are surely not. As a result, what it means to calculate or break down probability based on having been tested is subject to so many important caveats that vary over time. It is important to note that this is also cumulative test percentages/proportions.

Test2 <- Oregon.Tests %>% ggplot(.) + aes(x=date, y=Outcome, fill=Category, label=Outcome) + geom_col(position = "fill") + scale_fill_viridis_d() + labs(title="Cumulative Outcomes of COVID-19 Testing in Oregon", y="Proportion of Tests", x="Date", caption = "data: https://govstatus.egov.com/OR-OHA-COVID-19")
Test2

A Better Look at This

Cumulative results are perhaps more informative when viewed alongside the marginal changes from day to day. Here, I show the proportion of test outcomes just considering those test results reported for that day – the daily distribution of test outcomes.

library(hrbrthemes)
MyDat <- Oregon.Tests %>%
  pivot_wider(., names_from = Category, values_from = Outcome) %>% 
  mutate(Positive.Chg = Positive - lag(Positive, order_by = date), 
         Negative.Chg = Negative - lag(Negative, order_by = date)) %>% 
  mutate(Total = Positive.Chg + Negative.Chg) %>% 
  mutate(Positive.Pct = round(Positive.Chg / Total, digits=3)) %>% 
  select(date, Positive.Chg, Negative.Chg, Positive.Pct) %>% 
  pivot_longer(., c(Negative.Chg, Positive.Chg), names_to = "Outcome", values_to = "Test.Changes")
MyDat %>% 
  ggplot(., aes(x=date, y=Test.Changes, fill=Outcome)) + 
  geom_col(position="fill") + 
  geom_label(data=MyDat, aes(x=date, y=Positive.Pct, label=Positive.Pct), size=2.5, inherit.aes = TRUE, show.legend=FALSE) + 
  scale_fill_ipsum() + 
  labs(title="Daily Distribution of Test Outcomes", subtitle="Percent of Positive Tests Shown")

Hospitalization Data for Oregon

How do positive cases and hospitalization relate?

Hos1 <- OR.Hosp %>% 
  filter(Hospitalized.!="Total") %>% 
  ggplot(., aes(x=date, y=Number.of.cases,fill=Hospitalized.)) + 
  geom_col() + 
  scale_fill_ipsum() + 
  labs(x="Date", y="Number of COVID Positives", fill="Hospitalized?", title="Hospitalization and /n COVID-19 in Oregon") + 
  coord_flip()
Hos2 <- OR.Hosp %>% 
  filter(Hospitalized.!="Total") %>% 
  ggplot(., aes(x=date, y=Number.of.cases,fill=Hospitalized.)) + 
  geom_col(position="fill") + 
  scale_fill_ipsum() + 
  labs(x="Date", y="Number of COVID Positives", fill="Hospitalized?") + 
  coord_flip()
Hos1 + Hos2

Age Data

How does this break down by age? I had to break this into two parts. They changed the classification method on March 25, 2020.

OR.Ages1 <- OR.Ages %>% 
  filter(date < "2020-03-25")
Age1 <- OR.Ages1 %>% 
  ggplot(., aes(x=date, y=Number.of.cases, fill=Age.group)) + 
  geom_col() + 
  scale_fill_ipsum() + 
  labs(x="Date", y="Number of COVID-19 Positives") + 
  theme_economist() + 
  guides(fill=FALSE)
Age2 <- OR.Ages1 %>% ggplot(.) +
 aes(x = date, fill = Age.group, weight = Number.of.cases) +
 geom_bar(position = "fill") +
 scale_fill_ipsum() + labs(x="Date", y="Proportion of COVID-19 Positives") +
 theme_minimal()
Age1 + Age2

The Revision

OR.Ages1 <- OR.Ages %>% filter(date > "2020-03-24")
Age1 <- OR.Ages1 %>% 
  ggplot(., aes(x=date, y=Number.of.cases, fill=Age.group)) + geom_col() +
  scale_fill_viridis(discrete = TRUE) + labs(x="Date", y="Number of COVID-19 Positives") +
  theme_economist() + 
  theme(axis.text.x = element_text(size=8, angle=45)) +
  guides(fill=FALSE)
Age2 <- OR.Ages1 %>% ggplot(.) +
 aes(x = date, fill = Age.group, weight = Number.of.cases) +
 geom_bar(position = "fill") +
 scale_fill_viridis(discrete=TRUE) + labs(x="Date", y="Percent of COVID-19 Positives", fill="Age Group") +
 theme_economist() + 
 theme(axis.text.x = element_text(size=8, angle=45))
Age1 + Age2

Males and Females

From the outset, there have been more cases reported for females than males.

library(hrbrthemes)
OR.Sex.Reshape <- OR.Gender %>% 
  filter(Sex %in% c("Female","Male")) %>% 
  mutate(Actives = Cases - Deaths) %>% 
  select(-Cases) %>%  
  pivot_longer(., c(Actives,Deaths), names_to = "Outcome", values_to  = "Count")
OR.Sex.Reshape %>% 
  mutate(Category = paste(Sex, Outcome, sep=":")) %>% 
  ggplot(., aes(x=date, y=Count, fill=Outcome)) + 
  geom_col(position="stack") + 
  scale_fill_ipsum() + 
  labs(x="Date", y="Cases", title="Cumulative Deaths and COVID-19 Cases in Oregon by Sex") +
  facet_wrap(vars(Sex))

I do not know to what extent that mirrors the general trend that men are more likely to suffer serious cases. The death rate data seem to highlight that.

OR.Gender %>% 
  filter(Sex%in%c("Male","Female")) %>% 
  mutate(CFR = Deaths / Cases) %>% 
  ggplot(., aes(x=date, y=CFR, color=Sex)) + 
  geom_point() + 
  geom_line() + 
  theme_ipsum_rc() +
  labs(x="Date", y="Case Fatality Ratio", title="Oregon's Cumulative Case Fatality Ratio by Sex")

Hospital Capacity Data

The reported entries here vary a lot over time; they seem not to have figured out exactly what they wish to report.

OR.Hospital.Caps <- OR.Hospital.Caps %>% 
  mutate(Capacity = Hospital.Capacity) %>% 
  filter(Capacity != "ICU beds") %>% 
  filter(Capacity != "Non-ICU beds")
OR.Hospital.Caps  %>% ggplot() +
 aes(x = date, y = Number) +
 geom_col(position = "dodge", fill="purple") +
 theme_economist() +
 facet_wrap(vars(Capacity), scales = "free_y")

Analysis over Time

Admissions

OR.COVID.Strain %>% filter(COVID.19.Details=="COVID-19 admissions") %>% ggplot(., aes(x=date, y=Number)) + geom_point() + geom_line() + labs(title="COVID-19 Admissions", x="Date", y="Count", caption = "data: https://govstatus.egov.com/OR-OHA-COVID-19 \n By: @PieRatio") + theme_ipsum() + theme(axis.text.x = element_text(angle = 90))

Ventilators

Vent1 <- OR.Hospital.Caps %>% 
  filter(str_detect(Capacity, "Ventilators")) %>% 
  filter(Type!="Total") %>% 
  select(-Hospital.Capacity)
Vent2 <- OR.COVID.Strain %>% 
  filter(str_detect(COVID.19.Details, "ventilators")) %>% 
  mutate(Capacity = COVID.19.Details) %>% 
  select(-COVID.19.Details)
Vents <- bind_rows(Vent1,Vent2)
Vents %>% 
  ggplot(., aes(x=date, y=Number)) + 
  geom_point() + 
  geom_line() + 
  labs(title="Ventilators in Oregon", x="Date", y="Count", caption = "data: https://govstatus.egov.com/OR-OHA-COVID-19 \n By: @PieRatio") + 
  theme_ipsum() + theme(axis.text.x = element_text(angle = 90)) + 
  facet_wrap(vars(Capacity), scales = "free_y")

Beds

OR.Hospital.Caps %>% 
  filter(Capacity!="Ventilators") %>% 
  filter(Type!="Total") %>% 
  ggplot(., aes(x=date, y=Number)) + 
  geom_point() + 
  geom_line() + 
  labs(title="Available Beds in Oregon", x="Date", y="Count", caption = "data: https://govstatus.egov.com/OR-OHA-COVID-19 \n By: @PieRatio") + 
  theme_ipsum() + 
  theme(axis.text.x = element_text(angle = 90)) + 
  facet_wrap(vars(Capacity), scales = "free_y")